- Asia > British Indian Ocean Territory > Diego Garcia (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > United Kingdom (0.04)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- Asia > British Indian Ocean Territory > Diego Garcia (0.04)
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.04)
- Research Report (0.46)
- Instructional Material > Course Syllabus & Notes (0.46)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)
Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection
Blaschke, Verena, Körner, Felicia, Plank, Barbara
Slot and intent detection (SID) is a classic natural language understanding task. Despite this, research has only recently begun to focus on SID for dialectal and colloquial varieties. Many approaches for low-resource scenarios have not yet been applied to dialectal SID data or compared with each other on the same datasets. We participate in the VarDial 2025 shared task on slot and intent detection in Norwegian varieties and compare multiple set-ups: varying the training data (English, Norwegian, or dialectal Norwegian), injecting character-level noise, training on auxiliary tasks, and applying Layer Swapping, a technique in which layers of models fine-tuned on different datasets are assembled into a single model. We find noise injection to be beneficial, while the effects of auxiliary tasks are mixed. Although some experimentation was required to successfully assemble a model from layers, it worked surprisingly well: a combination of models trained on English and on small amounts of dialectal data produced the most robust slot predictions. Our best models achieve 97.6% intent accuracy and 85.6% slot F1 in the shared task.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
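A minimal Python sketch of the two techniques named in the MaiNLP abstract, under stated assumptions. The abstract does not specify the noise procedure or which layers were swapped, so the per-word edit rate `p`, the three edit types, and the hypothetical `layers.<i>.<name>` parameter naming are illustrative choices, not the authors' implementation.

```python
import random
import string


def inject_char_noise(text: str, p: float, rng: random.Random) -> str:
    """With probability p per word (words longer than one character), apply one
    random character edit: insert, delete, or substitute a lowercase ASCII letter.
    The edit types and per-word rate are illustrative assumptions."""
    noisy = []
    for word in text.split():
        if len(word) > 1 and rng.random() < p:
            i = rng.randrange(len(word))
            op = rng.choice(("insert", "delete", "substitute"))
            c = rng.choice(string.ascii_lowercase)
            if op == "insert":
                word = word[:i] + c + word[i:]
            elif op == "delete":
                word = word[:i] + word[i + 1:]
            else:
                word = word[:i] + c + word[i + 1:]
        noisy.append(word)
    return " ".join(noisy)


def assemble_from_layers(models: dict, layer_source: list, base: str) -> dict:
    """Build a new parameter dict: transformer layer i is copied from
    models[layer_source[i]]; all non-layer parameters (embeddings, heads)
    are copied from models[base]. The "layers.<i>.<name>" key scheme is
    hypothetical."""
    merged = {}
    for name, weights in models[base].items():
        if name.startswith("layers."):
            idx = int(name.split(".")[1])
            merged[name] = models[layer_source[idx]][name]
        else:
            merged[name] = weights
    return merged
```

Because the noise function edits characters inside words and never splits or merges them, word-level slot labels stay aligned with the noised text. In the layer assembly, choosing the per-layer source is the part the abstract says required experimentation; the sketch simply takes that assignment as an input.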
Revisiting In-Context Learning with Long Context Language Models
Baek, Jinheon, Lee, Sun Jae, Gupta, Prakhar, Oh, Geunseob, Dalmia, Siddharth, Kolhar, Prateek
In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context. Previously, the size of the context window limited the number of examples that could be shown, making example selection techniques crucial for identifying the most effective set of examples. However, the recent advent of Long Context Language Models (LCLMs) has significantly increased the number of examples that can be included in context, raising the question of whether ICL performance in a many-shot regime is still sensitive to the method of example selection. To answer this, we revisit these approaches in the context of LCLMs through extensive experiments on 18 datasets spanning 4 tasks. Surprisingly, we observe that sophisticated example selection techniques do not yield significant improvements over simple random selection. Instead, we find that the advent of LCLMs has fundamentally shifted the challenge of ICL from selecting the most effective examples to collecting enough examples to fill the context window. Specifically, for certain datasets, even including all available examples does not fill the context window; however, by augmenting the in-context examples with a simple data augmentation approach, we substantially improve ICL performance by 5%.
- Europe > Austria > Vienna (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
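The two findings of the ICL abstract (random selection is competitive, and leftover context can be filled with augmented examples) can be sketched in Python as below. The abstract does not specify the augmentation method, so `augment` is a hypothetical caller-supplied function, and counting tokens by whitespace is a stand-in for a real tokenizer.

```python
import random
from typing import Callable, List


def build_many_shot_context(
    examples: List[str],
    budget: int,
    rng: random.Random,
    augment: Callable[[str, random.Random], str],
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Randomly order the demonstrations and add them until the token budget
    is reached. If every real example fits and room remains, keep adding
    augmented variants of them until the budget is exhausted."""
    pool = list(examples)
    rng.shuffle(pool)  # random selection: no similarity- or diversity-based ranking
    chosen, used = [], 0
    for ex in pool:
        cost = count_tokens(ex)
        if used + cost > budget:
            return chosen  # context is full; no augmentation needed
        chosen.append(ex)
        used += cost
    # All real examples fit: pad the remaining budget with augmented copies.
    i = 0
    while pool:
        new = augment(pool[i % len(pool)], rng)
        cost = count_tokens(new)
        if cost == 0 or used + cost > budget:  # guard against empty outputs looping forever
            break
        chosen.append(new)
        used += cost
        i += 1
    return chosen
```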
CLIX: Cross-Lingual Explanations of Idiomatic Expressions
Gluck, Aaron, von der Wense, Katharina, Pacheco, Maria
One of the main areas of interest for the proponents of technology-assisted language learning is vocabulary expansion, where recent studies have demonstrated a significant impact on student engagement and increased vocabulary knowledge (Fisher, 2016; Guaqueta and Castro-Gárces, 2018; Tao Hao and Ardasheva, 2021). To support the development of these technologies, considerable work has been devoted to the study of automated definition generation (Ni and Wang, 2017; Gadetsky et al., 2018; Ishiwatari et al., 2019; Bevilacqua et al., 2020).
… that obtaining explanations of idiomatic expressions in the learner's first language removes many of the barriers to understanding introduced by traditional definition generation systems. We choose to focus on idiomatic expressions as they are an important element of language learning that is particularly challenging for learners and automated systems alike. Consider the utterance "he and I don't see eye to eye on a variety of topics." The idiomatic expression contained within this sentence is not composed of particularly challenging words, …
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications
Chen, Zhe, Liao, Yusheng, Jiang, Shuyang, Wang, Pingjie, Guo, Yiqiu, Wang, Yanfeng, Wang, Yu
Large language models (LLMs) hold promise for addressing healthcare challenges but often generate hallucinations due to limited integration of medical knowledge. Incorporating external medical knowledge is therefore critical, especially considering the breadth and complexity of medical content, which necessitates effective multi-source knowledge acquisition. We address this challenge by framing it as a source planning problem, where the task is to formulate context-appropriate queries tailored to the attributes of diverse knowledge sources. Existing approaches either overlook source planning or fail to achieve it effectively due to misalignment between the model's expectation of the sources and their actual content. To bridge this gap, we present MedOmniKB, a comprehensive repository comprising multigenre and multi-structured medical knowledge sources. Leveraging these sources, we propose the Source Planning Optimisation (SPO) method, which enhances multi-source utilisation through explicit planning optimisation. Our approach involves enabling an expert model to explore and evaluate potential plans while training a smaller model to learn source alignment using positive and negative planning samples. Experimental results demonstrate that our method substantially improves multi-source planning performance, enabling the optimised small model to achieve state-of-the-art results in leveraging diverse medical knowledge sources.
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
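The Omni-RAG abstract does not say how the positive and negative planning samples are formed. As a minimal Python sketch, assume the expert model assigns each explored plan a score, plans at or above a threshold count as positive, and each positive is paired with each negative for preference-style training of the smaller model. `ScoredPlan`, `build_planning_pairs`, and the source names in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A plan is a sequence of (knowledge source, query) pairs; source names are hypothetical.
Plan = Tuple[Tuple[str, str], ...]


@dataclass
class ScoredPlan:
    plan: Plan
    score: float  # assumed: the expert model's evaluation of this plan


def build_planning_pairs(plans: List[ScoredPlan], threshold: float) -> List[Tuple[Plan, Plan]]:
    """Split expert-explored plans into positive (score >= threshold) and
    negative samples, then pair every positive with every negative as a
    (preferred, rejected) example for training the smaller planner."""
    positives = [p for p in plans if p.score >= threshold]
    negatives = [p for p in plans if p.score < threshold]
    return [(pos.plan, neg.plan) for pos in positives for neg in negatives]
```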
Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking
Cheng, Xiaoxue, Li, Junyi, Zhao, Wayne Xin, Wen, Ji-Rong
Large language models (LLMs) demonstrate exceptional capabilities, yet they still suffer from hallucinations. Typical text generation approaches rely on auto-regressive generation without deliberate reasoning, which often results in untrustworthy and factually inaccurate responses. In this paper, we propose HaluSearch, a novel framework that incorporates tree search-based algorithms (e.g., MCTS) to enable an explicit slow-thinking generation process for mitigating hallucinations of LLMs during inference. Specifically, HaluSearch frames text generation as a step-by-step reasoning process, using a self-evaluation reward model to score each generation step and guide the tree search towards the most reliable generation pathway, thereby fully exploiting the internal knowledge of LLMs. To balance efficiency and quality, we introduce a hierarchical thinking-system switch mechanism inspired by the dual process theory in cognitive science, which dynamically alternates between fast and slow thinking modes at both the instance and step levels, adapting to the complexity of questions and reasoning states. We conduct extensive experiments on both English and Chinese datasets, and the results show that our approach significantly outperforms baseline approaches.
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
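HaluSearch uses tree search (e.g., MCTS) guided by a self-evaluation reward, plus a fast/slow switch. The Python sketch below is a deliberately simplified stand-in, not MCTS and not the authors' method: at each step it accepts the first proposed step when its reward clears a threshold (fast mode) and otherwise keeps the best of `k` sampled candidates (slow mode). `propose`, `reward`, `k`, and `threshold` are hypothetical hooks for an LLM sampler and a reward model; only the step-level switch is shown.

```python
from typing import Callable, List, Optional, Tuple

# propose(steps_so_far, sample_index) -> next step, or None when the answer is complete
Propose = Callable[[List[str], int], Optional[str]]
# reward(steps_so_far, candidate_step) -> self-evaluation score
Reward = Callable[[List[str], str], float]


def generate_with_switch(
    propose: Propose, reward: Reward, max_steps: int, k: int, threshold: float
) -> Tuple[List[str], List[str]]:
    """Step-by-step generation with a fast/slow switch. Returns the chosen
    steps and the mode ("fast" or "slow") used for each step."""
    steps: List[str] = []
    modes: List[str] = []
    for _ in range(max_steps):
        first = propose(steps, 0)
        if first is None:  # proposer signals the answer is complete
            break
        if reward(steps, first) >= threshold:  # confident: accept without search
            steps.append(first)
            modes.append("fast")
            continue
        # Not confident: sample more candidates and keep the highest-reward one.
        candidates = [first] + [propose(steps, j) for j in range(1, k)]
        candidates = [c for c in candidates if c is not None]
        steps.append(max(candidates, key=lambda c: reward(steps, c)))
        modes.append("slow")
    return steps, modes
```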
Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference
Gao, Mingqi, Liu, Yixin, Hu, Xinyu, Wan, Xiaojun, Bragg, Jonathan, Cohan, Arman
Evaluating and ranking the capabilities of different LLMs is crucial for understanding their performance and alignment with human preferences. Due to the high cost and time-consuming nature of human evaluations, an automatic LLM bencher (i.e., an automatic evaluation framework that aims to rank LLMs based on their alignment with human preferences) is indispensable. An automatic LLM bencher consists of four components: the input set (e.g., a user instruction), the evaluation model (e.g., an LLM), the evaluation type (e.g., pairwise comparison), and the aggregation method (e.g., the Elo rating system). However, previous work has not thoroughly explored how to select these components or how their different combinations influence the results. In this work, through controlled experiments, we provide a series of recommendations on how to choose each component to better automate the evaluation of LLMs. Furthermore, we find that when evaluating LLMs with similar performance, the performance of the automatic LLM bencher declines sharply, underscoring the limitations of current benchers and calling for future work. Lastly, we find that the evaluation models' performance at the instance level (e.g., the accuracy of selecting the best output) does not always align with their effectiveness when used as a component of a bencher, highlighting the importance of dedicated system-level evaluation of benchers.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Energy (0.67)
- Leisure & Entertainment > Games (0.36)
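The aggregation component named in the bencher abstract can be illustrated with the standard Elo update over pairwise judgments. A minimal Python sketch, assuming outcomes of 1.0 (A wins), 0.0 (B wins), or 0.5 (tie), the conventional 400-point scale, and a hypothetical K-factor of 32; the paper may configure its Elo aggregation differently.

```python
from typing import Dict, Iterable, List, Tuple


def elo_ratings(
    comparisons: Iterable[Tuple[str, str, float]], k: float = 32.0, initial: float = 1000.0
) -> Dict[str, float]:
    """Sequential Elo over (model_a, model_b, outcome) records."""
    ratings: Dict[str, float] = {}
    for a, b, outcome in comparisons:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))  # A's expected score
        ratings[a] = ra + k * (outcome - expected_a)
        ratings[b] = rb + k * ((1.0 - outcome) - (1.0 - expected_a))
    return ratings


def rank(ratings: Dict[str, float]) -> List[str]:
    """System-level ranking: highest rating first."""
    return sorted(ratings, key=ratings.get, reverse=True)
```

Sequential Elo is order-dependent: processing the same judgments in a different order can change the final ratings, so the processing order is itself part of the aggregation choice.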
"My life is miserable, have to sign 500 autographs everyday": Exposing Humblebragging, the Brags in Disguise
Naganna, Sharath, Bhattacharjee, Saprativa, Bhattacharyya, Pushpak, Banerjee, Biplab
Humblebragging is a phenomenon where individuals present self-promotional statements under the guise of modesty or complaints. For example, a statement like, "Ugh, I can't believe I got promoted to lead the entire team. So stressful!", subtly highlights an achievement while pretending to be complaining. Detecting humblebragging is important for machines to better understand the nuances of human language, especially in tasks like sentiment analysis and intent recognition. However, this topic has not yet been studied in computational linguistics. For the first time, we introduce the task of automatically detecting humblebragging in text. We formalize the task by proposing a 4-tuple definition of humblebragging and evaluate machine learning, deep learning, and large language models (LLMs) on this task, comparing their performance with humans. We also create and release a dataset called HB24, containing 3,340 humblebrags generated using GPT-4o. Our experiments show that detecting humblebragging is non-trivial, even for humans. Our best model achieves an F1-score of 0.88. This work lays the foundation for further exploration of this nuanced linguistic phenomenon and its integration into broader natural language understanding systems.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Singapore (0.04)
- Research Report (0.64)
- Overview (0.46)
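The humblebragging abstract reports an F1-score of 0.88 without stating whether it is binary, macro, or weighted F1. The Python sketch below computes binary F1 = 2PR / (P + R) with humblebrag as the positive class, which is an assumption.

```python
from typing import Sequence


def binary_f1(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed: 1 = humblebrag)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```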